Document Expansion using a Side Collection for Monolingual and Cross-language Spoken Document Retrieval
Abstract
This paper presents a method of document expansion using a side collection to improve the retrieval of spoken documents with text queries. The method is applied to Chinese spoken document retrieval (SDR) tasks, with a series of experiments carried out on both monolingual and cross-language SDR systems. In the monolingual retrieval experiments, Cantonese broadcast news documents are retrieved using a multi-scale syllable-based approach. Experimental results show that document expansion improves average inverse rank (AIR) by 56%. For the cross-language spoken document retrieval (CL-SDR) task, in which Mandarin broadcast news is retrieved using English textual queries, experimental results show that document expansion yields a 14% relative improvement in retrieval performance.
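The abstract reports its monolingual result in average inverse rank (AIR) without defining the measure. A minimal sketch follows, assuming the common definition: the mean, over all queries, of the reciprocal of the rank at which the target document is retrieved, with a query whose target is not retrieved contributing zero. The function names and the toy ranks are hypothetical and are not taken from the paper.

```python
from typing import Optional, Sequence


def average_inverse_rank(ranks: Sequence[Optional[int]]) -> float:
    """Mean of 1/rank over queries (assumed definition of AIR).

    Each entry is the 1-based rank of the target document for one query,
    or None if the target was not retrieved (contributes 0).
    """
    if not ranks:
        raise ValueError("at least one query is required")
    total = 0.0
    for rank in ranks:
        if rank is None:
            continue  # target not retrieved: adds nothing to the sum
        if rank < 1:
            raise ValueError("ranks are 1-based")
        total += 1.0 / rank
    return total / len(ranks)


def relative_improvement(baseline: float, expanded: float) -> float:
    """Relative change of a score with respect to a non-zero baseline."""
    return (expanded - baseline) / baseline


# Toy ranks for three queries, made up for illustration only.
baseline_air = average_inverse_rank([2, 4, None])
expanded_air = average_inverse_rank([1, 2, 4])
print(baseline_air, expanded_air, relative_improvement(baseline_air, expanded_air))
```

Under this definition AIR lies between 0 and 1, and a value of 1 means every target document was ranked first.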
Similar resources
Multi-scale document expansion in English-Mandarin cross-language spoken document retrieval
This paper presents the application of document expansion using a side collection to a cross-language spoken document retrieval (CL-SDR) task to improve retrieval performance. Document expansion is applied to a series of English-Mandarin CL-SDR experiments using selected retrieval models (probabilistic belief network, vector space model, and HMM-based retrieval model). English textual queries ar...
CLEF-2005 CL-SR at Maryland: Document and Query Expansion using Side Collections and Thesauri
This paper reports results for the University of Maryland’s participation in the CLEF-2005 Cross-Language Speech Retrieval track. Techniques that were tried include: (1) document expansion with manually created metadata (thesaurus keywords and segment summaries) from a large side collection, (2) query refinement with pseudo-relevance feedback, (3) keyword expansion with thesaurus synonyms, and (4) ...
Cross-Language Spoken Document Retrieval on the TREC SDR Collection
This paper presents preliminary experiments on cross-language spoken document retrieval (SDR) carried out on a benchmark assembled at ITC-irst. The benchmark is based on resources used in the last two spoken document retrieval tracks at the TREC conference, which are available on the Internet. They include automatic transcripts of American English broadcast news, short topics written in English,...
Issues in pre- and post-translation document expansion: untranslatable cognates and missegmented words
Query expansion by pseudo-relevance feedback is a well-established technique in both mono- and crosslingual information retrieval, enriching and disambiguating the typically terse queries provided by searchers. Comparable document-side expansion is a relatively more recent development motivated by error-prone transcription and translation processes in spoken document and cross-language retrieval....
Information fusion for monolingual and cross-language spoken document retrieval
Abstract of thesis entitled "Information fusion for monolingual and cross-language spoken document retrieval", submitted by LO Wai-Kit for the degree of Doctor of Philosophy at The Chinese University of Hong Kong in October 2002. Spoken document retrieval (SDR) is an important technique that enables relevant information to be searched from spoken data archives. With the advent of the Internet and multimedia te...
Publication date: 2003